89 research outputs found

    Mining Complex Hydrobiological Data with Galois Lattices

    We used Galois lattices to mine hydrobiological data on macrophytes, i.e. macroscopic plants living in water bodies. These plants are characterized by several biological traits, each of which has several modalities. Our aim is to cluster the plants according to their shared traits and modalities and to uncover the relations between traits. Galois lattices are well suited to this aim, but they apply only to binary data. In this article, we detail a few approaches we used to transform complex hydrobiological data into binary data and compare the first results obtained with Galois lattices.
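
As a rough illustration of the idea in the abstract above, the following sketch binarizes a many-valued table (one binary attribute per trait/modality pair) and then enumerates the formal concepts of the resulting binary context by brute force. The plant names, traits and modalities are invented for illustration, and the one-attribute-per-modality encoding is only one possible binarization, not necessarily one of those used by the authors.

```python
from itertools import combinations

# Hypothetical many-valued data (plant -> {trait: modality}); not from the paper.
PLANTS = {
    "plant_a": {"leaf": "floating", "size": "small"},
    "plant_b": {"leaf": "floating", "size": "large"},
    "plant_c": {"leaf": "submerged", "size": "large"},
}


def binarize(data):
    """Turn each (trait, modality) pair into one binary attribute."""
    return {
        obj: frozenset(f"{trait}={modality}" for trait, modality in traits.items())
        for obj, traits in data.items()
    }


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by brute force (small contexts only)."""
    all_attrs = frozenset().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: attributes shared by every object in the subset.
            intent = all_attrs
            for obj in subset:
                intent = intent & context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, intent))
    return concepts


CONTEXT = binarize(PLANTS)
CONCEPTS = formal_concepts(CONTEXT)
for extent, intent in sorted(CONCEPTS, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the lattice: a group of plants together with the trait modalities they all share. The brute-force enumeration is exponential in the number of objects, so it only serves to make the definition concrete.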

    Bag-of-word based brand recognition using Markov Clustering Algorithm for codebook generation

    To address the issue of online counterfeiting, it is necessary to use automatic tools that analyze the large amount of information available over the Internet. Analysis methods that extract information about the content of images are very promising for this purpose. In this paper, a method that automatically extracts the brand of objects in images is proposed. The method does not explicitly search for text or logos; this information is implicitly included in the Bag-of-Words representation. In the Bag-of-Words paradigm, visual features are clustered to create the visual words. Despite its shortcomings, k-means is the most widely used algorithm for this step, and with k-means the choice of the number of visual words is critical. In this paper, another clustering algorithm is proposed. The Markov Cluster Algorithm (MCL) is very fast, does not require an arbitrary selection of the number of classes and does not rely on random initialization. First, we demonstrate that MCL is competitive with k-means run with an experimentally selected number of clusters. Second, we show that it is possible to identify the brand of objects in images without prior knowledge of the visual identity of these brands.
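
To make the contrast with k-means concrete, here is a minimal pure-Python sketch of the general Markov Cluster Algorithm: alternate expansion (matrix squaring) and inflation (element-wise power followed by column normalization) on a similarity graph, then read the clusters off the rows that keep mass on the diagonal. The toy graph, the inflation value and the fixed iteration count are assumptions for illustration. The abstract does not say how the authors build the graph over visual features or which parameters they use.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Cluster the nodes of an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones, renormalize.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Rows with mass on the diagonal are attractors; their support is a cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return clusters


# Hypothetical toy graph: two disjoint triangles (nodes 0-2 and 3-5).
GRAPH = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
CLUSTERS = mcl(GRAPH)
print(sorted(sorted(c) for c in CLUSTERS))
```

Note that the number of clusters is never passed in, and nothing is initialized at random: the result depends only on the graph and the inflation parameter.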

    Reconstruction et analyse sémantique de chronologies cybercriminelles

    Scenario reconstruction is one of the most important steps in a digital investigation. It gives investigators a view of the events that occurred during an incident. Scenario reconstruction is a complex task that requires studying a very large number of events, owing to the omnipresence of new technologies in our daily lives. In addition, the conclusions produced must meet the criteria set by the justice system. To address these challenges, we propose a new methodology, based on an ontology that integrates expert knowledge from the fields of forensic science and software engineering, to assist investigators throughout the investigation process.

    Mining Complex Hydrobiological Data with Galois Lattices

    We used Galois lattices for mining hydrobiological data about macrophytes, i.e. macroscopic plants living in water bodies. These plants are characterized by several biological traits, which are divided into several modalities. Our aim was to cluster the plants according to their common traits and modalities and to find out the relations between the traits. Galois lattices are efficient methods for such an aim, but apply only to binary data. In this article, we detail a few of the approaches we used to turn complex hydrobiological data into binary data and compare the first results obtained with Galois lattices.

    Hierarchical Multi-Label Classification Using Web Reasoning for Large Datasets

    Extracting valuable data from large volumes of data is one of the main challenges in Big Data. In this paper, a Hierarchical Multi-Label Classification process called Semantic HMC is presented. This process aims to extract valuable data from very large data sources by automatically learning a label hierarchy and classifying data items. The Semantic HMC process is composed of five scalable steps, namely Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps automatically construct a label hierarchy from statistical analysis of the data. This paper focuses on the last two steps, which perform item classification according to the label hierarchy. The process is implemented as a scalable, distributed application and deployed on a Big Data platform. A quality evaluation is described, which compares the approach with state-of-the-art multi-label classification algorithms dedicated to the same goal. The Semantic HMC approach outperforms state-of-the-art approaches in some areas.

    Towards a Framework for Semantic Exploration of Frequent Patterns

    http://ceur-ws.org/Vol-1075/ - ISSN: 1613-0073. Mining frequent patterns is an essential task in discovering hidden correlations in datasets. Although frequent patterns unveil valuable information, several challenges limit their usability. First, the number of possible patterns is often very large, which hinders their effective exploration. Second, patterns with many items are hard to read, and the analyst may be unable to understand their meaning. In addition, the only available information about patterns is their support, a very coarse piece of information. In this paper, we are particularly interested in mining datasets that reflect usage patterns of users moving in space and time and for whom demographic attributes are available (age, occupation, etc.). Such characteristics are typical of data collected from smartphones, whose analysis has critical business applications nowadays. We propose pattern exploration primitives, abstraction and refinement, that use hand-crafted taxonomies on time, space and user demographics. We show on two real datasets, Nokia and MovieLens, how the use of such taxonomies reduces the size of the pattern space and how demographics enable their semantic exploration. This work opens new perspectives in the semantic exploration of frequent patterns that reflect the behavior of different user communities.
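
The effect of a taxonomy on the pattern space can be sketched as follows. This is only one plausible reading of "abstraction" (replace each item by its parent in a hand-crafted taxonomy before counting itemsets). The taxonomy, the transactions and the function names are hypothetical and are not taken from the paper or from the Nokia or MovieLens datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hand-crafted taxonomy: item -> more general parent.
TAXONOMY = {
    "08:00": "morning",
    "09:00": "morning",
    "20:00": "evening",
    "student": "young",
    "teen": "young",
}

# Hypothetical usage records: each one mixes a time slot and a demographic tag.
TRANSACTIONS = [
    {"08:00", "student"},
    {"09:00", "teen"},
    {"08:00", "teen"},
    {"20:00", "student"},
]


def abstract(transaction, taxonomy):
    """Generalize every item to its parent; items without a parent are kept."""
    return frozenset(taxonomy.get(item, item) for item in transaction)


def frequent_itemsets(transactions, min_support):
    """Count every non-empty itemset by brute force and keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                counts[combo] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


RAW = frequent_itemsets(TRANSACTIONS, min_support=1)
ABSTRACTED = frequent_itemsets(
    [abstract(t, TAXONOMY) for t in TRANSACTIONS], min_support=1
)
print(len(RAW), len(ABSTRACTED))
print(ABSTRACTED[("morning", "young")])
```

On this toy data the abstracted pattern space is smaller than the raw one, and the pattern ("morning", "young") gathers support that was spread over several more specific patterns. Refinement would go the other way, expanding a general pattern back into its more specific children.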

    Une ontologie de la culture de la vigne : des savoirs académiques aux savoirs d'expérience

    As part of an FUI project launched in October 2016 (the winecloud project), which aims to build a traceability and prediction tool for the vine and wine cycle, work on the collection and nature of knowledge was needed in order to design an ontological system that comes as close as possible to the reasoning of the business domain. This article focuses more specifically on the life cycle of the vine. We show that the academic knowledge found in theoretical and scientific sources is adjusted and updated in the light of winegrowers' experiential knowledge. This work also analyzes the protean nature of experiential knowledge and accounts for its plurality.

    Galois lattices for fuzzy many-valued contexts, application to life traits study in hydrobiology

    This computer science PhD thesis falls within the framework of Formal Concept Analysis (FCA), or Galois lattices, which are tools based on mathematical operators called Galois connections that generate concepts. A concept consists of a set of objects sharing a set of attributes. These concepts are generated from a context, which is a table of binary relations between the objects and the attributes. We are interested in complex contexts whose complexity has two sources: many-valued contexts, in which the attributes are divided into several modalities, and fuzzy contexts, in which the relation between objects and attributes is not binary. We define fuzzy many-valued contexts, which inherit both complexities, and introduce two conversions for fuzzy many-valued data. The first conversion is a binarization by complete disjunctive coding, which makes it possible to use tools such as implications and to compare and combine lattices with statistical methods such as factor analysis. The second conversion is derived from histogram scaling, which we define and which converts attributes into histograms. To generate concepts from histograms, we propose new Galois connections based on a similarity measure between these histograms. These connections yield concepts in which objects share attributes that are not equal but similar, lying between a common minimum and maximum. We also propose thresholds to limit the number of generated concepts and reduce computation time. We tested and compared the performance of two algorithms implementing this connection, MinMaxNC and MinMaxC. The thesis is applied to hydrobiology, where ecological traits must be selected to characterize the ecological quality of watercourses from the behaviour of species in their environment. The selection of these traits is based on the search for groups of taxa sharing morphological and physiological characteristics (called biological traits). These groups correspond to concepts in FCA, and biological data can be treated as a fuzzy many-valued context, for which we show the efficiency of our approach.